This document details how bins are visualized in anvi'o.
Anvi'o is run in a dedicated conda environment.
conda activate anvio-7.1
Convert the bin information into a format that anvi'o can import. For each binning method, concatenate the per-bin files into a single two-column, tab-separated file that lists each contig (or read) and the bin it belongs to.
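library(stringr)  # str_replace()

# list.dirs() returns "../data/Bins" itself first; 2:7 selects the directories beneath it
path <- list.dirs("../data/Bins")
for (i in 2:7) {
  DF <- NULL
  pathname <- path[i]
  filelist <- list.files(pathname)
  for (filename in filelist) {
    # Each bin file lists one contig/read name per line
    df <- read.csv(file.path(pathname, filename), header = FALSE)
    colnames(df) <- "read"
    # The bin name is the file name with its first "." replaced by "_"
    df$bin <- str_replace(filename, "[.]", "_")
    # Spell out the sample count in the bin names
    if (basename(pathname) == "24_sample_bam_bins") {
      df$bin <- str_replace(df$bin, "24", "twentyfour")
    }
    if (basename(pathname) == "47_sample_bam_bins") {
      df$bin <- str_replace(df$bin, "47", "fortyseven")
    }
    DF <- rbind(DF, df)
  }
  # Write one headerless, tab-separated file per binning method
  write.table(DF, paste0("../output/all_bins/", basename(pathname), ".tsv"), row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t")
}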
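Before importing, it can help to sanity-check each collection file. The following is a minimal shell sketch, not part of the original workflow: `check_collection_tsv` is a hypothetical helper, and the bin-name rule it enforces (letters, digits, and underscores only, not starting with a digit) is an assumption inferred from the renaming done in the loop above rather than a statement of anvi'o's exact requirements.

```shell
# Hypothetical helper: report problems in a two-column collection TSV.
# Returns non-zero if any problem is found.
check_collection_tsv() {
    awk -F'\t' '
        # Every line should have exactly two tab-separated fields
        NF != 2 { printf "line %d: expected 2 columns, found %d\n", NR, NF; bad = 1; next }
        # Assumed naming rule: letters, digits, underscores; no leading digit
        $2 !~ /^[A-Za-z_][A-Za-z0-9_]*$/ { printf "line %d: suspicious bin name %s\n", NR, $2; bad = 1 }
        # A contig should not be assigned to two different bins
        ($1 in seen) && seen[$1] != $2 { printf "line %d: %s is already in %s\n", NR, $1, seen[$1]; bad = 1 }
        { seen[$1] = $2 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `check_collection_tsv ../output/all_bins/assembly_bins.tsv` prints nothing and returns 0 when the file passes all three checks.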
Examine one of the resulting TSV files.
library(knitr)  # kable()
# The TSV files have no header row, so read them with header = FALSE
tsv_output <- read.csv("../output/all_bins/assembly_bins.tsv", sep = "\t", header = FALSE, col.names = c("contig", "bin"))
kable(head(tsv_output, 5))
| contig | bin |
|---|---|
| MG1058_s821.ctg000852l | assembly_bin_1 |
| MG1058_s1105.ctg001148l | assembly_bin_1 |
| MG1058_s1585.ctg001645l | assembly_bin_1 |
| MG1058_s1820.ctg001893l | assembly_bin_1 |
| MG1058_s645.ctg000674l | assembly_bin_10 |
Import the bins as collections into the anvi'o profile database that was already created.
# Example for one collection; change the input file and the -C collection name for each binning method
anvi-import-collection "./github/jordan-marinimicrobia/output/all_bins/short_reads_bam_bins.tsv" -p "./Downloads/plus_PROFILE.db" -c "./Library/CloudStorage/GoogleDrive-jwinter2@uw.edu/Shared drives/Rocap Lab/Project_ODZ_Marinimicrobia_Jordan/Anvio/assembly_plus/1058_P1_2018_585_0.2um_assembly_plus.db" --contigs-mode -C shortreads
anvi-interactive -p "./Downloads/plus_PROFILE.db" -c "./Library/CloudStorage/GoogleDrive-jwinter2@uw.edu/Shared drives/Rocap Lab/Project_ODZ_Marinimicrobia_Jordan/Anvio/assembly_plus/1058_P1_2018_585_0.2um_assembly_plus.db"
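Since the import has to be repeated for every binning method, the commands can be generated in a loop. The sketch below only prints the commands so they can be reviewed before running. `print_import_commands` is a hypothetical helper, and the rule for deriving collection names (drop a trailing `_bins` and `_bam`, then remove underscores) is an assumption: it reproduces `shortreads` from `short_reads_bam_bins.tsv`, but the names used for the other collections are not given here, and names derived from the 24- and 47-sample files would start with a digit and may need to be spelled out.

```shell
# Hypothetical helper: print (do not run) one anvi-import-collection
# command per TSV file found in a directory.
print_import_commands() {
    bins_dir="$1"; profile_db="$2"; contigs_db="$3"
    for tsv in "$bins_dir"/*.tsv; do
        [ -e "$tsv" ] || continue            # skip if the glob matched nothing
        name=$(basename "$tsv" .tsv)
        name=${name%_bins}                   # assumed naming rule, see lead-in
        name=${name%_bam}
        collection=$(printf '%s' "$name" | tr -d '_')
        printf 'anvi-import-collection "%s" -p "%s" -c "%s" --contigs-mode -C %s\n' \
            "$tsv" "$profile_db" "$contigs_db" "$collection"
    done
}
```

Called as `print_import_commands ../output/all_bins PROFILE.db CONTIGS.db` (placeholder database paths), it emits one line per collection file; the output can be edited and then pasted into the terminal.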
Example of what the interactive browser looks like with bins.
[Figure: anvi'o interactive browser]
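Inspect “contaminated” bins with anvi-refine to see how and why they are contaminated. Remember that bin names in the anvi'o collections differ from the original file names: “.” was replaced with “_”, and 24 and 47 were spelled out as “twentyfour” and “fortyseven”.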
anvi-refine -p "./Downloads/assembly_PROFILE.db" -c "./Library/CloudStorage/GoogleDrive-jwinter2@uw.edu/Shared drives/Rocap Lab/Project_ODZ_Marinimicrobia_Jordan/Anvio/assembly_only/1058_P1_2018_585_0.2um_assembly.db" -C shortreads -b short_reads_bam_bin_163
Example of a contaminated bin.
[Figure: anvi'o interactive display of a contaminated bin]
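Load the bin summary for the assembly_plus bins and look at the distribution of each statistic.
# Named bin_summary so the data frame does not shadow the summary() function
bin_summary <- read.table("../output/anvio_outputs/assembly_plus_summary.txt", sep = "\t", header = TRUE)
summary(bin_summary)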
## bins total_length num_contigs N50
## Length:274 Min. : 202581 Min. : 1.00 Min. : 10203
## Class :character 1st Qu.: 310692 1st Qu.: 9.00 1st Qu.: 12386
## Mode :character Median : 488296 Median : 23.00 Median : 14620
## Mean : 879731 Mean : 52.00 Mean : 74294
## 3rd Qu.: 896170 3rd Qu.: 50.75 3rd Qu.: 50200
## Max. :20079188 Max. :1582.00 Max. :3034959
## GC_content percent_completion percent_redundancy t_domain
## Min. :26.47 Min. : 0.00 Min. : 0.00 Length:274
## 1st Qu.:38.99 1st Qu.: 0.00 1st Qu.: 0.00 Class :character
## Median :45.84 Median : 0.00 Median : 0.00 Mode :character
## Mean :47.44 Mean : 13.16 Mean : 16.19
## 3rd Qu.:57.02 3rd Qu.: 23.59 3rd Qu.: 0.00
## Max. :69.50 Max. :100.00 Max. :2053.52
## t_phylum t_class t_order t_family
## Length:274 Length:274 Length:274 Length:274
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## t_genus t_species
## Length:274 Length:274
## Class :character Class :character
## Mode :character Mode :character
##
##
##
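The summary table can also be used to shortlist bins worth opening in anvi-refine. The following shell sketch lists bins whose `percent_redundancy` exceeds a cutoff. `list_redundant_bins` is a hypothetical helper; it assumes the summary file is tab-separated with a header row containing the `bins` and `percent_redundancy` columns shown above, and the default cutoff of 10 is an illustrative choice, not a value taken from this analysis.

```shell
# Hypothetical helper: print "bin<TAB>percent_redundancy" for every bin
# whose redundancy exceeds the cutoff (second argument, default 10).
list_redundant_bins() {
    awk -F'\t' -v t="${2:-10}" '
        # Locate the two columns by name in the header row
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "bins") b = i
                if ($i == "percent_redundancy") r = i
            }
            next
        }
        b && r && ($r + 0) > (t + 0) { printf "%s\t%s\n", $b, $r }
    ' "$1"
}
```

For example, `list_redundant_bins ../output/anvio_outputs/assembly_plus_summary.txt 10` prints one line per bin above the cutoff; if either column is missing from the header, nothing is printed.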
Create and annotate anvi'o contigs databases for my bins.
anvi-gen-contigs-database -f sulf_genomes/assembly_plus_bin_4.fa -o sulf_genomes/dbs/sulfbin4.db
anvi-run-hmms -c sulf_genomes/dbs/sulfbin4.db
anvi-run-scg-taxonomy -c sulf_genomes/dbs/sulfbin4.db
anvi-scan-trnas -c sulf_genomes/dbs/sulfbin4.db
anvi-run-ncbi-cogs -c sulf_genomes/dbs/sulfbin4.db
anvi-run-kegg-kofams -c sulf_genomes/dbs/sulfbin4.db
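The genomes storage step that follows reads an external-genomes file: a tab-delimited table with the header `name` and `contigs_db_path` and one row per contigs database. A minimal shell sketch for generating it is shown here. `write_external_genomes` is a hypothetical helper, and it assumes every `.db` file in the given directory is a contigs database whose file name (without `.db`) is an acceptable genome name.

```shell
# Hypothetical helper: write an external-genomes table to stdout,
# one row per .db file found in the given directory.
write_external_genomes() {
    dbs_dir="$1"
    printf 'name\tcontigs_db_path\n'
    for db in "$dbs_dir"/*.db; do
        [ -e "$db" ] || continue             # skip if the glob matched nothing
        printf '%s\t%s\n' "$(basename "$db" .db)" "$db"
    done
}
```

For example, `write_external_genomes sulf_genomes/dbs > sulf-external-genomes.txt` would list `sulfbin4` alongside any other databases in that directory.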
Combine the contigs databases into a genomes storage, then compute and display the pangenome.
anvi-gen-genomes-storage -e sulf-external-genomes.txt \
    -o sulf-GENOMES.db
anvi-pan-genome -g sulf-GENOMES.db -n sulfitobacter
anvi-display-pan -g sulf-GENOMES.db -p sulfitobacter/sulfitobacter-PAN.db
Pangenome visualization.
[Figure: Sulfitobacter pangenome]